A literature-driven method to calculate similarities among diseases

Authors

  • Hyunjin Kim
  • Youngmi Yoon
  • Jaegyoon Ahn
  • Sanghyun Park

Abstract

BACKGROUND "Our lives are connected by a thousand invisible threads, and along these sympathetic fibers, our actions run as causes and return to us as results." This famous quote by Herman Melville describes the connections among human lives. To paraphrase Melville, diseases are connected by many functional threads, and along these sympathetic fibers, diseases run as causes and return as results. This idea explains the reason for researching disease-disease similarity and disease networks. Measuring similarities between diseases and constructing a disease network can play an important role in research on disease function and in disease treatment. To estimate disease-disease similarities, we propose a novel literature-based method.

METHODS AND RESULTS The proposed method extracts disease-gene and disease-drug relations from the literature and uses the frequencies of occurrence of these relations as features to calculate similarities among diseases. We also constructed a disease network from the top-ranking disease pairs produced by our method. The proposed method discovered more answer disease pairs than other comparable methods and showed the lowest p-value.

CONCLUSIONS We presume that our method performed well for three reasons. It uses literature data. It uses all possible gene symbols and drug names as features of a disease. It determines the feature values of diseases from the co-occurrence frequencies of two entities. The disease-disease similarities produced by the proposed method can be used in computational biology research that relies on similarities among diseases.
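The feature construction described in the abstract can be illustrated with a small sketch. Each disease becomes a vector of co-occurrence counts over gene symbols and drug names. Pairs of diseases are then scored and ranked, and the top pairs could serve as edges of a disease network.

This sketch rests on several assumptions:

- The entity extraction from the literature is already done, and the input is a list of (disease, entity) co-occurrence pairs.
- The abstract does not name the similarity function, so cosine similarity over raw counts is used here as a stand-in. It may differ from the paper's actual measure.
- All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_features(cooccurrences):
    """Map each disease to a Counter of co-occurrence frequencies with
    entities (gene symbols or drug names). Input: iterable of (disease, entity)."""
    features = {}
    for disease, entity in cooccurrences:
        features.setdefault(disease, Counter())[entity] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    if norm_sq == 0:
        return 0.0
    return dot / math.sqrt(norm_sq)


def top_pairs(features, k):
    """Score every disease pair and return the k highest as
    (score, disease_x, disease_y); ties are broken by name for determinism."""
    scored = [
        (cosine(features[x], features[y]), x, y)
        for x, y in combinations(sorted(features), 2)
    ]
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical co-occurrences, standing in for relations mined from abstracts.
    pairs = [
        ("asthma", "IL13"), ("asthma", "IL13"), ("asthma", "salbutamol"),
        ("COPD", "IL13"), ("COPD", "salbutamol"), ("COPD", "salbutamol"),
        ("diabetes", "INS"), ("diabetes", "metformin"),
    ]
    feats = build_features(pairs)
    for score, x, y in top_pairs(feats, 2):
        print(f"{x} -- {y}: {score:.3f}")
```

In this toy data, asthma and COPD share both entities with different frequencies, so they form the top-ranked pair. Diabetes shares nothing with either. Keeping raw frequencies in the vectors, rather than binary presence, reflects the abstract's point that feature values come from co-occurrence frequencies.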

Similar sources

Localization of hidden software faults using cross-entropy and n-gram models

The aim is to automate the process of bug localization in program source code. The cause of a program failure can best be determined by comparing and analyzing the correct and incorrect execution paths generated by running the instrumented program with different failing and passing test cases. To compare and analyze the execution paths, one approach is to cluster the paths according to their simil...

Application of the Genetic Algorithm to Calculate the Interaction Parameters for Multiphase and Multicomponent Systems

A method based on the Genetic Algorithm (GA) was developed to study the phase behavior of multicomponent and multiphase systems. Upon application of the GA to the thermodynamic models which are commonly used to study the VLE, VLLE and LLE phase equilibria, the physically meaningful values for the Binary Interaction Parameters (BIP) of the models were obtained. Using the method proposed in t...

The Language of Two Homologous Aesthetic Notions: Sufism and Surrealism

There are hypotheses in the history of human culture and civilization in which we can find undeniable similarities and commonalities among them, despite their vastly different cultural and historical contexts. For example, finding commonalities between the language of two artistic aesthetic hypotheses, Sufism and Surrealism, which are very different from each other in terms of context, time as ...

Treatment and prevention of acute respiratory infections among Iranian hajj pilgrims: a 5-year follow up study and review of the literature

Background: Respiratory diseases/syndromes are the most common causes of referring to physicians among pilgrims in Hajj. They lead to high morbidity, impose high costs on the health system and are among the major obstacles for pilgrims to perform Hajj duties. The main aim of our study was to determine types, frequencies, etiologies, and epidemiologic factors of respiratory diseases among Ir...

Respondent-driven sampling compared with other methods for sampling hidden populations

Sampling hidden populations is challenging due to the lack of convenient statistical frames. Since most populations exposed to special diseases are hidden and hard to reach, sampling methods that produce representative and efficient samples from these populations have become a subject of study for researchers all over the world. Because of the unknown probability of selecting samples in conventional...

Journal title:
  • Computer Methods and Programs in Biomedicine

Volume 122, Issue 2

Publication date: 2015